Finding Optimal Pairs of Patterns
Authors
Abstract
We consider the problem of finding the optimal pair of string patterns for discriminating between two sets of strings, i.e. finding the pair of patterns that is best with respect to an appropriate scoring function that gives higher scores to pattern pairs which occur more often in the strings of one set and less often in the other. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns, where N is the total length of the strings. The algorithm considers all possible Boolean combinations of the patterns, e.g. patterns of the form p ∧ ¬q, which means that the pattern pair is considered to match a given string s if p occurs in s AND q does NOT occur in s. The same algorithm can be applied to a variant of the problem in which we are given a single set of sequences along with a numeric attribute assigned to each sequence, and the problem is to find the optimal pattern pair whose occurrence in the sequences is correlated with this numeric attribute. An efficient implementation based on suffix arrays is presented, and the algorithm is applied to several nucleotide sequence datasets of moderate size, combined with microarray gene expression data, with the aim of finding regulatory elements that cooperate with, complement, or compete with each other in enhancing and/or silencing certain genomic functions.
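To make the problem statement concrete, the following is a minimal brute-force sketch of the search, not the suffix-array algorithm of the paper. It enumerates every pair of substrings and a few Boolean combinations, and scores each by a stand-in function (positives matched minus negatives matched). The function names, the choice of combinations, and the scoring function are all assumptions made for illustration.

```python
from itertools import product

def substrings(strings):
    """All distinct non-empty substrings of the given strings."""
    return {s[i:j] for s in strings
            for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def occurrence_mask(pattern, strings):
    """Bitmask with bit k set if pattern occurs in strings[k]."""
    return sum(1 << k for k, s in enumerate(strings) if pattern in s)

def popcount(x):
    return bin(x).count("1")

def best_pair(pos, neg):
    """Brute-force search for the best-scoring pattern pair.

    Returns (score, combination_name, p, q), or None if there are no
    substrings. The score is an illustrative assumption: number of
    positive strings matched minus number of negative strings matched.
    """
    strings = list(pos) + list(neg)
    full = (1 << len(strings)) - 1
    pos_mask = (1 << len(pos)) - 1      # low bits index the positive set
    neg_mask = full ^ pos_mask          # remaining bits index the negative set
    masks = {p: occurrence_mask(p, strings)
             for p in sorted(substrings(strings))}
    # A few Boolean combinations of two occurrence bitmasks.
    combos = {
        "p AND q": lambda a, b: a & b,
        "p AND NOT q": lambda a, b: a & ~b & full,
        "p OR q": lambda a, b: a | b,
    }
    best = None
    for (p, a), (q, b) in product(masks.items(), repeat=2):
        for name, combine in combos.items():
            matched = combine(a, b)
            score = popcount(matched & pos_mask) - popcount(matched & neg_mask)
            if best is None or score > best[0]:
                best = (score, name, p, q)
    return best
```

For example, with pos = ["acg", "gac"] and neg = ["actt", "ggg"], the single pattern "ac" also matches one negative string, whereas the pair "ac" AND NOT "t" matches both positive strings and no negative one. This naive enumeration is far slower than the algorithm described in the abstract and is only meant to show what is being optimized.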
Similar resources
Finding Optimal Pairs of Cooperative and Competing Patterns with Bounded Distance
We consider the problem of discovering the optimal pair of substring patterns with bounded distance α, from a given set S of strings. We study two classes of patterns: one of the form p ∧α q, interpreted as cooperative patterns within distance α, and the other of the form p ∧α ¬q, representing competing patterns, with respect to S. We show an efficient algorithm to find the optimal pa...
Some Results about the Contractions and the Pendant Pairs of a Submodular System
Submodularity is an important property of set functions, with deep theoretical results and various applications. Submodular systems appear in many application areas, for example machine learning, economics, computer vision, social science, game theory and combinatorial optimization. Submodular function optimization has recently attracted the attention of many researchers. Pendant pairs of a symmetric...
Using a Treebank for Finding Opposites
We present an automatic method for extracting pairs of opposites (e.g. hot-cold, top-bottom, buy-sell) by means of dependency patterns that are learned from a 450-million-word treebank containing texts from Dutch newspapers. Using small sets of seed pairs, we identify the best patterns for finding new pairs of opposites. Treebanks are useful for generating dependency patterns expressing relat...
Finding common structured patterns in linear graphs
A linear graph is a graph whose vertices are linearly ordered. This linear ordering allows pairs of disjoint edges to be either preceding (<), nesting (⊏) or crossing (≬). Given a family of linear graphs, and a non-empty subset R ⊆ {<, ⊏, ≬}, we are interested in the Maximum Common Structured Pattern (MCSP) problem: find a maximum size edge-disjoint graph, with edge pairs all comparable by one of...
A Level-Combination Approach for Estimating the Origin-Destination Matrix in Large-Scale Networks
Transportation problems are usually considered in large-scale networks, where finding the optimal solution is time-consuming and costly. Therefore, a useful method for solving large-scale network problems is to divide them into smaller sub-problems. In this paper, for the first time, the origin-destination (o-d) matrix estimation problem is considered through a mixed p...